BUSINESS QUESTIONS: Note: ignore the fact there is no question 1, this is due to a formatting error 2. a) Is there are a trend in injuries By Region, and is there different peaks of times of year per region? b) Analysis on the types of animals that are injured, this also by Region – is there a species that is more liable to injury in certain regions? - (cat/dog by region) c) What is the outcome? Does this differ by region?

  1. Total call volume for complaint calls: How has this trended over time?
  2. Is there a particular animal being called about the most?
  3. Do particular suburbs have different type of complaint calls? Do they call about different animals? ((MAKE A LEAFLET MAP FOR THIS!))
  1. Business Intelligence – using the insights you have found, can you predict how this might look for the upcoming year?
library(tidyverse)
library(tsibble)

Attaching package: ‘tsibble’

The following object is masked from ‘package:lubridate’:

    interval

The following objects are masked from ‘package:base’:

    intersect, setdiff, union
library(forecast)
Registered S3 method overwritten by 'quantmod':
  method            from
  as.zoo.data.frame zoo 
This is forecast 8.20 
  Stackoverflow is a great place to get help on R issues:
  http://stackoverflow.com/tags/forecasting+r.
source("cleaning_script.R")
Rows: 31330 Columns: 7── Column specification ─────────────────────────────────────────────────────────
Delimiter: ","
chr (6): nature, animal_type, category, suburb, date_range, city
lgl (1): responsible_office
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.Warning: Expected 2 pieces. Missing pieces filled with `NA` in 5643 rows [7, 11, 12, 18, 19, 25, 27, 29, 30, 31, 39, 44, 52, 54, 66, 67, 74, 75, 78, 81, ...].Rows: 42413 Columns: 5── Column specification ─────────────────────────────────────────────────────────
Delimiter: ","
chr (5): Animal Type, Complaint Type, Date Received, Suburb, Electoral Division
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.Rows: 664 Columns: 12── Column specification ─────────────────────────────────────────────────────────
Delimiter: ","
chr  (2): animal_type, outcome
dbl (10): year, ACT, NSW, NT, QLD, SA, TAS, VIC, WA, Total
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Note; summer in Australia seasons: Summer: December - February Autumn: March - May Winter: June - August Spring: September - November

also: The ‘wet season’ in Australia’s North: November - April

https://www.mdpi.com/2076-2615/8/7/100 will useful reading for later, talk about the need to reduce euthanasia or similar and the effect this has on people

Intro: Introduce the point of the talk, talk about Australian RSPCA, talk about how the data was gathered (using the information from the websites), the PURPOSE of this investigation (which is to help the RSPCA know which areas/animals to focus their efforts on), as I’m introducing the datasets I can introduce the two different cities. First, we can talk a short bit about Australia as a whole, it’s climate, the kind of animals etc. The point here is to really set the scene before diving too deep into facts and figures, as especially a non-technical audience this will help keep them engaged and make a more holistic presentation. In my opinion, it’s always good to zoom out and see the big picture, rather than getting lost in the myopia of some csv files.

Townsville Intro: Townsville is a city on the north-eastern coast of Queensland, Australia. With a population of 180,820 as of June 2018, it is the largest settlement in North Queensland; it is unofficially considered its capital. [note: put a map of Australia with Queensland and Townsville highlighted here. Talk a little bit about the population density, urbanisation, climate, types of animals that are common here etc. Show some photos of the area too]

Brisbane Intro: Exact same as above

3.

  1. Total call volume for complaint calls: How has this trended over time?

First, let’s look at the Townsville animal complaints.

animal_complaints %>% 
  group_by(date_received) %>% 
  summarise(count = n()) %>% 
  ggplot(aes(x = date_received, y = count)) +
  geom_line() +
  scale_x_date(date_breaks = "6 months", date_labels = "%b-%y")

This shows some seasonality and also an increase then a decline. Each summer (in December) the calls are much lower, rising again each Winter. Using geom_smooth, we get:

animal_complaints %>% 
  group_by(date_received) %>% 
  summarise(count = n()) %>% 
  ggplot(aes(x = date_received, y = count)) +
  geom_smooth() +
  geom_point() +
  scale_x_date(date_breaks = "1 year", date_labels = "%y")

This shows a bit more clearly the general trend of call volume. From 2014 it steadily rises, peaking in 2017. Afterwards, it steadily declines to only slightly higher than where it started. It would be very difficult to say whether this trend will continue downward, go upward or stay relatively flat.

Note; if we have time or if it’s helpful, we will fix this graph so that quarters are properly displayed.

brisbane_complaints %>% 
  group_by(date) %>% 
  summarise(count = n()) %>% 
  ggplot(aes(x = date, y = count)) +
  geom_point() +
  geom_line() +
  scale_x_date(date_breaks = "3 months", date_labels = "%y")

Again, we see the seasonality of winter having more calls. The fact this is in both Brisbane and Townsville suggests a fairly general trend.

brisbane_complaints %>% 
  group_by(date) %>% 
  summarise(count = n()) %>% 
  ggplot(aes(x = date, y = count)) +
  geom_point() +
  geom_smooth() +
  scale_x_date(date_breaks = "1 year", date_labels = "20%y")

Other than declining a little over 2016, the number of calls sees a slow but steady increase towards 2020, being thousands more than it was in 2016 and 2017. So, the general trend is that the RSPCA are getting more complaints as time goes on. Now, this does not necessarily mean the line will continue to go up. We are also missing Q3 from 2016 so this skews the curve a little

3.

  1. Is there a particular animal being called about the most?
animal_complaints %>% 
  group_by(animal_type) %>% 
  ggplot(aes(x = animal_type)) +
  geom_bar()

The number of calls about dogs dwarf those about cats hugely. Let’s look at specific numbers:

animal_complaints %>% 
  group_by(animal_type) %>%
  count() %>% 
  summarise(n, 4094 / 38319)

So cats account for only 10% of the calls dogs do for Townsville! Let’s look at Brisbane:

brisbane_complaints %>% 
  group_by(type_of_animal) %>%
  count()

brisbane_complaints %>% 
  group_by(type_of_animal) %>%
  ggplot(aes(x = type_of_animal)) +
  geom_bar()

We have a lot more animal types here that we don’t in the previous data. We also have many calls about Attacks with no animal specified, many of which it is very likely they were involving dogs. However, we can’t say this for sure.

So that makes

4745 / 13334  # cats are 35% of the calls compared to dogs
[1] 0.3558572
# all other animals divided by dogs, leaving out unspecified (which we believe potentially contain a high proportion of dogs)
9457 / 13349
[1] 0.7084426

Most of the other animals have very small counts, interestingly foxes seem to make up a decent proportion of the calls regarding wild animals.

To answer the question however, it’s mainly dogs and cats, and especially dogs. Dogs are being called about more than any of the other animals combined (if we leave Unspecified to the side)

(Attack refers to the initial description of the complaint)

3.

  1. Do particular suburbs have different type of complaint calls? Do they call about different animals?

OK, so this one is difficult simply for the fact there is a huge amount of suburbs.

brisbane_complaints %>% 
  group_by(suburb) %>% 
  count()

# There is 192 suburbs! Certainly using a fill on a graph is not gonna work, neither is putting them on the x-axis on a bar graph.

animal_complaints %>% 
  group_by(suburb) %>% 
  count()
# 85 suburbs for Townsville

animal_complaints %>% 
  group_by(electoral_division) %>% 
  count()
# also 11 electoral divisions. Could we look at suburbs one electoral division at a time? Possibly

Idea: What about using leaflet to visualise types of complaint calls on a map?

This is more feasible with the Townsville datasets, as it has less suburbs and only 6 types of complaints. It may be necessary to try and wrangle the Brisbane data a little further to narrow down categories (for both type_of_animal and complaint_type)

FOR TOWNSVILLE WE HAVE: 85 different suburbs 6 complaint types 2 animal types

We want to break it down by suburb and complaint type, and then by suburb and animal type

animal_complaints %>% 
  ggplot(aes(x = suburb, fill = animal_type)) +
  geom_bar(position = "fill") +
  coord_flip() +
  scale_x_discrete(guide = guide_axis(n.dodge = 5))

animal_complaints %>% 
  ggplot(aes(x = suburb, fill =complaint_type)) +
  geom_bar(position = "fill") 

  #coord_flip() +
  #scale_x_discrete(guide = guide_axis(n.dodge = 5))

The challenge again, is having too many suburbs that we can’t get any useful information out of the graphs

animal_complaints %>% 
  group_by(suburb) %>% 
  count(sort = TRUE)

# let's drop any suburbs with less than 200 cases

animal_complaints %>% 
  group_by(suburb) %>% 
  summarise(count = n(), animal_type) %>% 
  filter(count >= 500) %>% 
  ggplot(aes(x = suburb, y = count, fill = animal_type)) +
  geom_col() +
  coord_flip()
`summarise()` has grouped output by 'suburb'. You can override using the `.groups` argument.

The highest count by quite a large margin is unallocated to a specific suburb.

animal_complaints %>% 
  group_by(suburb) %>% 
  summarise(count = n(), animal_type) %>% 
  filter(count >= 500) %>% 
  ggplot(aes(x = suburb, y = count, fill = animal_type)) +
  geom_col(position = "fill") +
  coord_flip() 
`summarise()` has grouped output by 'suburb'. You can override using the `.groups` argument.

At a glance, there’s no big difference on animal types. All the suburbs have a large majority of dogs. Could we do a hypothesis test to see if there is a statistically significant difference?

animal_complaints %>% 
  group_by(suburb) %>% 
  summarise(count = n(), animal_type) %>% 
  filter(count < 500 & count > 100) %>% 
  ggplot(aes(x = suburb, y = count, fill = animal_type)) +
  geom_col(position = "fill") +
  coord_flip() 
`summarise()` has grouped output by 'suburb'. You can override using the `.groups` argument.

Looking at the lower end of the population, we see an outlier. Townsville City has way more cats than any of the other suburbs. In fact, it’s almost 50/50!

Now still looking at the Townsville dataset, we’ll break it down by complaint type:

animal_complaints %>% 
  group_by(suburb) %>% 
  summarise(count = n(), complaint_type) %>% 
  filter(count >= 500 & count <= 4000) %>% 
  ggplot(aes(x = suburb, y = count, fill = complaint_type)) +
  geom_col(position = "fill") +
  coord_flip() 
`summarise()` has grouped output by 'suburb'. You can override using the `.groups` argument.

Now for the lower end of the count:

animal_complaints %>% 
  group_by(suburb) %>% 
  summarise(count = n(), complaint_type) %>% 
  filter(count < 500 & count > 50) %>% 
  ggplot(aes(x = suburb, y = count, fill = complaint_type)) +
  geom_col(position = "fill") +
  coord_flip() 
`summarise()` has grouped output by 'suburb'. You can override using the `.groups` argument.

I’m not sure much can be gleamed with so many complaint_types. Let’s try focusing more specifically.

One thing to note here, is the vast variation in tolerance for noise. Most categories are consistent except this one. Cluden for example has a small percentage of noise complaints,while Bohle Plains has a huge percentage. This requires more investigation to figure out the root cause of this:

animal_complaints %>% 
  group_by(suburb) %>% 
  summarise(count = n(), complaint_type) %>% 
  filter(count >= 500) %>% 
  filter(complaint_type == "Noise" | (complaint_type == "Attack")) %>% 
  ggplot(aes(x = suburb, y = count, fill = complaint_type)) +
  geom_col(position = "fill") +
  coord_flip() 
`summarise()` has grouped output by 'suburb'. You can override using the `.groups` argument.

animal_complaints %>% 
  group_by(suburb) %>% 
  summarise(count = n(), complaint_type) %>% 
  filter(count >= 500) %>% 
  filter(complaint_type == "Noise" | (complaint_type == "Aggressive Animal")) %>% 
  ggplot(aes(x = suburb, y = count, fill = complaint_type)) +
  geom_col(position = "fill") +
  coord_flip() 
`summarise()` has grouped output by 'suburb'. You can override using the `.groups` argument.

animal_complaints %>% 
  group_by(suburb) %>% 
  summarise(count = n(), complaint_type) %>% 
  filter(count >= 500) %>% 
  filter(complaint_type == "Wandering" | (complaint_type == "Aggressive Animal")) %>% 
  ggplot(aes(x = suburb, y = count, fill = complaint_type)) +
  geom_col(position = "fill") +
  coord_flip() 
`summarise()` has grouped output by 'suburb'. You can override using the `.groups` argument.

animal_complaints %>% 
  group_by(suburb) %>% 
  summarise(count = n(), complaint_type) %>% 
  filter(count >= 500) %>% 
  filter(complaint_type == "Enclosure" | (complaint_type == "Noise")) %>% 
  ggplot(aes(x = suburb, y = count, fill = complaint_type)) +
  geom_col(position = "fill") +
  coord_flip() 
`summarise()` has grouped output by 'suburb'. You can override using the `.groups` argument.

For the less than 500 greater than 50 groups:

animal_complaints %>% 
  group_by(suburb) %>% 
  summarise(count = n(), complaint_type) %>% 
  filter(count < 500 & count > 100) %>% 
  filter(complaint_type == "Noise" | (complaint_type == "Attack")) %>% 
  ggplot(aes(x = suburb, y = count, fill = complaint_type)) +
  geom_col(position = "fill") +
  coord_flip() 
`summarise()` has grouped output by 'suburb'. You can override using the `.groups` argument.

animal_complaints %>% 
  group_by(suburb) %>% 
  summarise(count = n(), complaint_type) %>% 
  filter(count < 500 & count > 100) %>%  
  filter(complaint_type == "Noise" | (complaint_type == "Aggressive Animal")) %>% 
  ggplot(aes(x = suburb, y = count, fill = complaint_type)) +
  geom_col(position = "fill") +
  coord_flip() 
`summarise()` has grouped output by 'suburb'. You can override using the `.groups` argument.

animal_complaints %>% 
  group_by(suburb) %>% 
  summarise(count = n(), complaint_type) %>% 
  filter(count < 500 & count > 100) %>% 
  filter(complaint_type == "Wandering" | (complaint_type == "Aggressive Animal")) %>% 
  ggplot(aes(x = suburb, y = count, fill = complaint_type)) +
  geom_col(position = "fill") +
  coord_flip() 
`summarise()` has grouped output by 'suburb'. You can override using the `.groups` argument.

animal_complaints %>% 
  group_by(suburb) %>% 
  summarise(count = n(), complaint_type) %>% 
  filter(count < 500 & count > 100) %>% 
  #filter(complaint_type == "Enclosure" | (complaint_type == "Noise")) %>% 
  ggplot(aes(x = suburb, y = count, fill = complaint_type)) +
  geom_col(position = "fill") +
  coord_flip() +
  facet_wrap(~ suburb)
`summarise()` has grouped output by 'suburb'. You can override using the `.groups` argument.

animal_complaints %>% 
  group_by(suburb) %>% 
  summarise(count = n(), complaint_type, animal_type) %>% 
  filter(count < 500 & count > 100) %>% 
  ggplot(aes(x = complaint_type, y = count, fill = complaint_type)) +
  geom_col() +
  facet_wrap(~ suburb)
`summarise()` has grouped output by 'suburb'. You can override using the `.groups` argument.

From this graph, we can see that Hyde Park has a disproportionate amount of private impounds, while Bohle plains has more noise complaints.

Potential hypothesis test:

That the level of private impounds in Hyde Park being greater is statistically significant.

That the level of noise in Bohle Plains being greater is statistically significant.

animal_complaints %>% 
  group_by(suburb) %>% 
  count() %>% 
  filter(n < 500 & n > 100)

Let’s look at the suburbs with much higher complaints now:

animal_complaints %>% 
  group_by(suburb) %>% 
  summarise(count = n(), complaint_type, animal_type) %>% 
  filter(count >= 500 & count <4000) %>% 
  ggplot(aes(x = complaint_type, y = count, fill = complaint_type)) +
  geom_col() +
  facet_wrap(~ suburb)
`summarise()` has grouped output by 'suburb'. You can override using the `.groups` argument.

2. a) Is there are a trend in injuries By Region, and is there different peaks of times of year per region?

It’s impossible to say if there’s different peaks of times of yer per region, as we only have the data for the year as a whole. Unless we look at Suburb rather than region, and break this down by time. However, I think it’d be good to use the general Autralia (nationwide) data.

animal_outcomes %>% 
  group_by(region) %>% 
  ggplot(aes(x = region, y = number_of_occurences, fill = outcome, col = outcome)) +
  geom_col(position = "fill")

animal_outcomes %>% 
  group_by(region, year) %>% 
  summarise(count = n())
`summarise()` has grouped output by 'region'. You can override using the `.groups` argument.
---
title: "Analysis"
output: html_notebook
---

BUSINESS QUESTIONS:
Note: ignore the fact there is no question 1, this is due to a formatting error
2. 
a) Is there are a trend in injuries By Region, and is there different peaks of times of year per region?
b) Analysis on the types of animals that are injured, this also by Region – is there a species that is more liable to injury in certain regions? - (cat/dog by region)
c) What is the outcome? Does this differ by region?

3.
a) Total call volume for complaint calls: How has this trended over time?
b) Is there a particular animal being called about the most?
c) Do particular suburbs have different type of complaint calls? Do they call about different animals? ((MAKE A LEAFLET MAP FOR THIS!))

4.
Business Intelligence – using the insights you have found, can you predict how this might look for the upcoming year?

```{r}
library(tidyverse)
library(tsibble)
library(forecast)
source("cleaning_script.R")
```

Note; summer in Australia seasons:
Summer: December - February
Autumn: March - May
Winter: June - August
Spring: September - November

also: The 'wet season' in Australia's North: November - April

https://www.mdpi.com/2076-2615/8/7/100   will useful reading for later, talk about the need to reduce euthanasia or similar and the effect this has on people

Intro:
Introduce the point of the talk, talk about Australian RSPCA, talk about how the data was gathered (using the information from the websites), the PURPOSE of this investigation (which is to help the RSPCA know which areas/animals to focus their efforts on), as I'm introducing the datasets I can introduce the two different cities. First, we can talk a short bit about Australia as a whole, it's climate, the kind of animals etc. The point here is to really set the scene before diving too deep into facts and figures, as especially a non-technical audience this will help keep them engaged and make a more holistic presentation. In my opinion, it's always good to zoom out and see the big picture, rather than getting lost in the myopia of some csv files.

Townsville Intro:
Townsville is a city on the north-eastern coast of Queensland, Australia. With a population of 180,820 as of June 2018, it is the largest settlement in North Queensland; it is unofficially considered its capital. [note: put a map of Australia with Queensland and Townsville highlighted here. Talk a little bit about the population density, urbanisation, climate, types of animals that are common here etc. Show some photos of the area too]

Brisbane Intro:
Exact same as above

### 3.
a) Total call volume for complaint calls: How has this trended over time?

First, let's look at the Townsville animal complaints.
```{r}
animal_complaints %>% 
  group_by(date_received) %>% 
  summarise(count = n()) %>% 
  ggplot(aes(x = date_received, y = count)) +
  geom_line() +
  scale_x_date(date_breaks = "6 months", date_labels = "%b-%y")
```
This shows some seasonality and also an increase then a decline. Each summer (in December) the calls are much lower, rising again each Winter. Using geom_smooth, we get:

```{r}
animal_complaints %>% 
  group_by(date_received) %>% 
  summarise(count = n()) %>% 
  ggplot(aes(x = date_received, y = count)) +
  geom_smooth() +
  geom_point() +
  scale_x_date(date_breaks = "1 year", date_labels = "%y")
```

This shows a bit more clearly the general trend of call volume. From 2014 it steadily rises, peaking in 2017. Afterwards, it steadily declines to only slightly higher than where it started. It would be very difficult to say whether this trend will continue downward, go upward or stay relatively flat.







Note; if we have time or if it's helpful, we will fix this graph so that quarters are properly displayed.
```{r}
brisbane_complaints %>% 
  group_by(date) %>% 
  summarise(count = n()) %>% 
  ggplot(aes(x = date, y = count)) +
  geom_point() +
  geom_line() +
  scale_x_date(date_breaks = "3 months", date_labels = "%y")
```

Again, we see the seasonality of winter having more calls. The fact this is in both Brisbane and Townsville suggests a fairly general trend.

```{r}
brisbane_complaints %>% 
  group_by(date) %>% 
  summarise(count = n()) %>% 
  ggplot(aes(x = date, y = count)) +
  geom_point() +
  geom_smooth() +
  scale_x_date(date_breaks = "1 year", date_labels = "20%y")
```

Other than declining a little over 2016, the number of calls sees a slow but steady increase towards 2020, being thousands more than it was in 2016 and 2017. So, the general trend is that the RSPCA are getting more complaints as time goes on. Now, this does not necessarily mean the line will continue to go up. We are also missing Q3 from 2016 so this skews the curve a little



### 3.
b) Is there a particular animal being called about the most?

```{r}
animal_complaints %>% 
  group_by(animal_type) %>% 
  ggplot(aes(x = animal_type)) +
  geom_bar()
```

The number of calls about dogs dwarf those about cats hugely. Let's look at specific numbers:
```{r}
animal_complaints %>% 
  group_by(animal_type) %>%
  count() %>% 
  summarise(n, 4094 / 38319)
```

So cats account for only 10% of the calls dogs do for Townsville! Let's look at Brisbane:

```{r}
brisbane_complaints %>% 
  group_by(type_of_animal) %>%
  count()

brisbane_complaints %>% 
  group_by(type_of_animal) %>%
  ggplot(aes(x = type_of_animal)) +
  geom_bar()
```

We have a lot more animal types here that we don't in the previous data. We also have many calls about Attacks with no animal specified, many of which it is very likely they were involving dogs. However, we can't say this for sure.

So that makes 
```{r}
4745 / 13334  # cats are 35% of the calls compared to dogs



# all other animals divided by dogs, leaving out unspecified (which we believe potentially contain a high proportion of dogs)
9457 / 13349
```

Most of the other animals have very small counts, interestingly foxes seem to make up a decent proportion of the calls regarding wild animals.

To answer the question however, it's mainly dogs and cats, and especially dogs. Dogs are being called about more than any of the other animals combined (if we leave Unspecified to the side)

(Attack refers to the initial description of the complaint)


### 3. 
c) Do particular suburbs have different type of complaint calls? Do they call about different animals?

OK, so this one is difficult simply for the fact there is a huge amount of suburbs.
```{r}
brisbane_complaints %>% 
  group_by(suburb) %>% 
  count()

# There is 192 suburbs! Certainly using a fill on a graph is not gonna work, neither is putting them on the x-axis on a bar graph.

animal_complaints %>% 
  group_by(suburb) %>% 
  count()
# 85 suburbs for Townsville

animal_complaints %>% 
  group_by(electoral_division) %>% 
  count()
# also 11 electoral divisions. Could we look at suburbs one electoral division at a time? Possibly
```

Idea: What about using leaflet to visualise types of complaint calls on a map?

This is more feasible with the Townsville datasets, as it has less suburbs and only 6 types of complaints. It may be necessary to try and wrangle the Brisbane data a little further to narrow down categories (for both type_of_animal and complaint_type)

FOR TOWNSVILLE WE HAVE:
85 different suburbs
6 complaint types
2 animal types

We want to break it down by suburb and complaint type, and then by suburb and animal type
```{r}
animal_complaints %>% 
  ggplot(aes(x = suburb, fill = animal_type)) +
  geom_bar(position = "fill") +
  coord_flip() +
  scale_x_discrete(guide = guide_axis(n.dodge = 5))
```

```{r}
animal_complaints %>% 
  ggplot(aes(x = suburb, fill =complaint_type)) +
  geom_bar(position = "fill") 
  #coord_flip() +
  #scale_x_discrete(guide = guide_axis(n.dodge = 5))
```

The challenge again, is having too many suburbs that we can't get any useful information out of the graphs

```{r}
animal_complaints %>% 
  group_by(suburb) %>% 
  count(sort = TRUE)

# let's drop any suburbs with less than 200 cases

animal_complaints %>% 
  group_by(suburb) %>% 
  summarise(count = n(), animal_type) %>% 
  filter(count >= 500) %>% 
  ggplot(aes(x = suburb, y = count, fill = animal_type)) +
  geom_col() +
  coord_flip()
```

The highest count by quite a large margin is unallocated to a specific suburb.

```{r}
animal_complaints %>% 
  group_by(suburb) %>% 
  summarise(count = n(), animal_type) %>% 
  filter(count >= 500) %>% 
  ggplot(aes(x = suburb, y = count, fill = animal_type)) +
  geom_col(position = "fill") +
  coord_flip() 
```
At a glance, there's no big difference on animal types. All the suburbs have a large majority of dogs. Could we do a hypothesis test to see if there is a statistically significant difference?

```{r}
animal_complaints %>% 
  group_by(suburb) %>% 
  summarise(count = n(), animal_type) %>% 
  filter(count < 500 & count > 100) %>% 
  ggplot(aes(x = suburb, y = count, fill = animal_type)) +
  geom_col(position = "fill") +
  coord_flip() 
```
Looking at the lower end of the population, we see an outlier. Townsville City has way more cats than any of the other suburbs. In fact, it's almost 50/50!

Now still looking at the Townsville dataset, we'll break it down by complaint type:
```{r}
animal_complaints %>% 
  group_by(suburb) %>% 
  summarise(count = n(), complaint_type) %>% 
  filter(count >= 500 & count <= 4000) %>% 
  ggplot(aes(x = suburb, y = count, fill = complaint_type)) +
  geom_col(position = "fill") +
  coord_flip() 
```

Now for the lower end of the count:
```{r}
animal_complaints %>% 
  group_by(suburb) %>% 
  summarise(count = n(), complaint_type) %>% 
  filter(count < 500 & count > 50) %>% 
  ggplot(aes(x = suburb, y = count, fill = complaint_type)) +
  geom_col(position = "fill") +
  coord_flip() 
```

I'm not sure much can be gleamed with so many complaint_types. Let's try focusing more specifically.

One thing to note here, is the vast variation in tolerance for noise. Most categories are consistent except this one. Cluden for example has a small percentage of noise complaints,while Bohle Plains has a huge percentage. This requires more investigation to figure out the root cause of this: 

```{r}
animal_complaints %>% 
  group_by(suburb) %>% 
  summarise(count = n(), complaint_type) %>% 
  filter(count >= 500) %>% 
  filter(complaint_type == "Noise" | (complaint_type == "Attack")) %>% 
  ggplot(aes(x = suburb, y = count, fill = complaint_type)) +
  geom_col(position = "fill") +
  coord_flip() 

animal_complaints %>% 
  group_by(suburb) %>% 
  summarise(count = n(), complaint_type) %>% 
  filter(count >= 500) %>% 
  filter(complaint_type == "Noise" | (complaint_type == "Aggressive Animal")) %>% 
  ggplot(aes(x = suburb, y = count, fill = complaint_type)) +
  geom_col(position = "fill") +
  coord_flip() 

animal_complaints %>% 
  group_by(suburb) %>% 
  summarise(count = n(), complaint_type) %>% 
  filter(count >= 500) %>% 
  filter(complaint_type == "Wandering" | (complaint_type == "Aggressive Animal")) %>% 
  ggplot(aes(x = suburb, y = count, fill = complaint_type)) +
  geom_col(position = "fill") +
  coord_flip() 

animal_complaints %>% 
  group_by(suburb) %>% 
  summarise(count = n(), complaint_type) %>% 
  filter(count >= 500) %>% 
  filter(complaint_type == "Enclosure" | (complaint_type == "Noise")) %>% 
  ggplot(aes(x = suburb, y = count, fill = complaint_type)) +
  geom_col(position = "fill") +
  coord_flip() 
```

For the less than 500 greater than 50 groups:
```{r}
animal_complaints %>% 
  group_by(suburb) %>% 
  summarise(count = n(), complaint_type) %>% 
  filter(count < 500 & count > 100) %>% 
  filter(complaint_type == "Noise" | (complaint_type == "Attack")) %>% 
  ggplot(aes(x = suburb, y = count, fill = complaint_type)) +
  geom_col(position = "fill") +
  coord_flip() 

animal_complaints %>% 
  group_by(suburb) %>% 
  summarise(count = n(), complaint_type) %>% 
  filter(count < 500 & count > 100) %>%  
  filter(complaint_type == "Noise" | (complaint_type == "Aggressive Animal")) %>% 
  ggplot(aes(x = suburb, y = count, fill = complaint_type)) +
  geom_col(position = "fill") +
  coord_flip() 

animal_complaints %>% 
  group_by(suburb) %>% 
  summarise(count = n(), complaint_type) %>% 
  filter(count < 500 & count > 100) %>% 
  filter(complaint_type == "Wandering" | (complaint_type == "Aggressive Animal")) %>% 
  ggplot(aes(x = suburb, y = count, fill = complaint_type)) +
  geom_col(position = "fill") +
  coord_flip() 

animal_complaints %>% 
  group_by(suburb) %>% 
  summarise(count = n(), complaint_type) %>% 
  filter(count < 500 & count > 100) %>% 
  #filter(complaint_type == "Enclosure" | (complaint_type == "Noise")) %>% 
  ggplot(aes(x = suburb, y = count, fill = complaint_type)) +
  geom_col(position = "fill") +
  coord_flip() +
  facet_wrap(~ suburb)
```


```{r}
animal_complaints %>% 
  group_by(suburb) %>% 
  summarise(count = n(), complaint_type, animal_type) %>% 
  filter(count < 500 & count > 100) %>% 
  ggplot(aes(x = complaint_type, y = count, fill = complaint_type)) +
  geom_col() +
  facet_wrap(~ suburb)
```

From this graph, we can see that Hyde Park has a disproportionate amount of private impounds, while Bohle plains has more noise complaints.

Potential hypothesis test:

That the level of private impounds in Hyde Park being greater is statistically significant.

That the level of noise in Bohle Plains being greater is statistically significant.
```{r}
animal_complaints %>% 
  group_by(suburb) %>% 
  count() %>% 
  filter(n < 500 & n > 100)
```






Let's look at the suburbs with much higher complaints now:

```{r}
animal_complaints %>% 
  group_by(suburb) %>% 
  summarise(count = n(), complaint_type, animal_type) %>% 
  filter(count >= 500 & count <4000) %>% 
  ggplot(aes(x = complaint_type, y = count, fill = complaint_type)) +
  geom_col() +
  facet_wrap(~ suburb)
```





### 2. a) Is there are a trend in injuries By Region, and is there different peaks of times of year per region?

It's impossible to say if there's different peaks of times of yer per region, as we only have the data for the year as a whole. Unless we look at Suburb rather than region, and break this down by time. However, I think it'd be good to use the general Autralia (nationwide) data.









```{r}
animal_outcomes %>% 
  group_by(region) %>% 
  ggplot(aes(x = region, y = number_of_occurences, fill = outcome, col = outcome)) +
  geom_col(position = "fill")
```


```{r}
animal_outcomes %>% 
  group_by(region, year) %>% 
  summarise(count = n()) %>% 
  ggplot(aes(x = year, y = count)) +
  #geom_point() +
  geom_line() +
  facet_wrap(~ region)
```



















